ABSTRACT

Research procedures used in the development and
validation of R/EAL (Reading/Everyday Activities in Life), a new test
to overcome problems in assessing functional literacy among
adolescents and adults, are described. Specific objectives of the
study were to: (1) provide information about the design and
development of R/EAL, including determination of reading criteria,
establishment of task analyses, production and selection of
individual test items, development of test format and procedures, and
completion of the final version of R/EAL; and (2) provide information
about the validation of R/EAL, including procedures and data on
reliability and validity for a select sample. Following field testing
of the R/EAL, a revised version of the instrument was developed,
which contains reading criteria selected from common daily reading
materials; includes tasks related to each criterion developed based
on the reading functions required to deal with the individual
criteria; and presents all directions and questions in aural form via
individually operated cassette players. The methodology used for
validating R/EAL, including sample selection, testing procedures, and
item analysis, is described, and data relating to individual item and
total test statistics, factor analysis data, reliability, and
validity are reported. The reliability and validity figures on R/EAL
tend to support its use as a viable assessment instrument for
functional literacy. Item analysis figures show a difficulty level of
items for the total group ranging from .35 to •S? with a median of
.60. (DB)
THE DEVELOPMENT AND VALIDATION OF R/EAL,
AN INSTRUMENT TO ASSESS FUNCTIONAL LITERACY¹
Marilyn Lichtman, Ed.D.
The Catholic University of America 
Recently, interest in literacy in the United States
has undergone serious changes in emphasis and direction.
Increased emphasis on the extent of illiteracy was spawned
by Allen's Right to Read speech of the sixties, resulting in
an ever-increasing public awareness that some 18½ million
adult Americans remain basically incapable of performing
simple tasks involving minimum reading skills. Federal, state
and local efforts have been mounted to deal with the problem.
Right to Read Councils are mushrooming across the country in
both schools and communities. A renewed interest in reading
and literacy has been taken in Congress (see S-1318, "The
Elementary School Reading Emphasis Act of 1973," a bill
sponsored by Senators Beall and Dominick). A recent survey
of parents in the state of Maryland indicated that reading 
was the primary area in which the schools should concentrate. 

¹ Parts of the research presented herein were performed pursuant to
research grants supported by The Catholic University of America and by
Job Corps, Department of Labor, JCC No. 32yA-'-/9.

Literacy Assessment
One area of particular concern is the assessment of
literacy. Traditionally, literacy assessments were equated
with measures of reading achievement. An individual was
judged to be functionally literate if he received a grade
level score on a standardized reading achievement test of
anywhere from fifth to eighth grade level. Authorities differ
as to which of these grade levels should be equated with
functional literacy; but whatever the score, the measurement of
literacy involved a standardized reading achievement test
(usually designed for the elementary school child) and the
assignment of a grade level score with norms developed based
on performance of elementary school children.

The content and format of such tests usually fit the
following pattern: reading comprehension is measured with a
number of relatively short paragraphs, usually graded in
difficulty, followed by three or four questions designed to tap
such skills as determining the main idea, noting details, and
the like. The content of the paragraphs represents a range of
interests suitable for a predominantly elementary school age
child; practical reading tasks are usually not included. The
student reads the paragraphs and responds to the questions
by selecting one correct answer from a set of four and marking
the appropriate answer blank. By implication, then, functional
literacy, if measured by one of the standardized tests described
above, would be defined as the grade level score received
on a test composed of such content.

It is suggested that tests such as these are not suitable
measures of functional literacy, especially with adults
and minority group members. The validity question is here
considered in three aspects: the content of the tests; the
format of administration; and the use and interpretation of
scores.
Test Content 

An examination of the content focuses, firstly, on the 
relationship between such content and content which might be 
considered more suitable or representative of the domain of 
functional literacy. Harris (1971) suggested that the focus
of functional reading ability should be on reading skills
required to cope with everyday experiences. Harris's two
surveys (1970 and 1971) were built on this philosophy and
contained practical reading content. In the development of
the National Assessment material some emphasis was placed on
including content which represented more practical aspects
of reading (National Assessment of Educational Progress,
May 1972). Interest in practical reading material has likewise
been expressed by such central figures in reading as Ruth
Love Holloway, Director of the Right to Read Program of the
Office of Education, and Nathaniel Dixon, Executive Director
of the National Reading Center. Although a firm definition
of functional literacy has yet to be agreed upon, all indica- 
tions are that emphasis will be on the performance of reading 
tasks directly related to practical real life experiences. It 
is thus suggested here that tests whose content is not repre- 
sentative of such practical real-life reading are inappropriate 
for the measurement of functional literacy. 

Further, it is suggested that the content of many of 
the reading achievement tests is unsuitable for use with youth 
and adults. Such content is often child oriented and young 
adults are often poorly motivated to respond to what they
consider to be a test "beneath" them. A somewhat related
problem concerning the appropriateness of content is its
suitability for minority group members. Surveys have indicated
that large numbers of the illiterate in the United States are
members of minority groups. Recognizing inappropriateness of
content, Harcourt Brace has indicated that they are reviewing
their tests with the view towards "soliciting the reactions
of Blacks, Spanish-Speaking Americans, and others who are
familiar with the needs and styles of pupils from a variety of
minority backgrounds to any content which might be,
unintentionally, inappropriate or offensive for such children,"
(emphasis added) (Fitzgibbon, 1973, p. 3). The American
Psychological Association, in their third draft of the
Standards for Development and Use of Educational and Psychological
Tests, cites social ills attributed to tests as one of the prime
motivators in the revision of their Standards. They express
concern with such areas as "failure to choose an appropriate
test" (p. II).

Closely related to the type of information presented 
in the test is the physical appearance of such information.
Thus, not only should the content be drawn from practical
reading experiences, it should actually be as close in
appearance to the actual material as possible. It is unclear what
influence the physical layout of the material may have on a
reader, but it is suggested that a representation which
approximates the form in which the reader is likely to encounter the
material in his actual reading provides a more accurate
picture of the reader's true performance. For example, if one
wanted to measure an individual's ability to read and
interpret a lease, a facsimile of an actual lease rather than the
content of the lease in some modified or rewritten version
should be presented. It would seem that this approach
represents a more accurate approximation of the actual task. Face
validity may have considerable effect on the student's
attitude towards taking the test and the teacher's attitude towards
interpreting its meaningfulness in the assessment of functional
literacy.

Format of Administration 

Test administration format may also affect student 
motivation to respond. Traditional reading tests may actually
pose a threat for the poor reader or for the adult who has
frequently faced failure in test situations. It remains
unclear what effect the authority figure may have on student
response, but it is suggested that such an authority figure might
create a feeling of anxiety which could adversely affect the 
student's ability to perform. Poor readers do not like to 
reveal their deficiencies, especially in group situations. 

Most reading achievement tests are power tests, i.e., 
speed is not generally a factor influencing student performance. 
However, almost all reading tests are timed and the amount of 
time allotted for test completion may be insufficient for the 
poor reader. Thus an undue bias may result from administering 
a test in a group situation and imposing a time limit. 



The test taking situation and the appearance of the
test may have a negative effect on student performance. It
has been suggested that students are "turned off" to taking
tests, and that such negative feelings may cause them to perform
less well than they might in the actual situation. Thus, many
tests, which are actually supposed to be samples of student
performance, may not actually represent unbiased samples of
tasks. Responses to such negatively viewed tests may not
adequately reflect the student's true performance.

Response mode may also present bias in the testing of
poor adult readers. If one is desirous of knowing a student's
ability to perform a given reading task, then his actual
performance (his output) should be measured, rather than his ability
to select one item from a set of four or five. The multiple
choice format most frequently used in the testing of reading
may actually prove to be an unreliable measure of an adult's
literacy capability. The National Assessment materials and
procedures support this approach, as do some materials developed
by HumRRO in their tests to assess job related skills (National
Assessment, 1972, and Sticht, 1972).




Use and Interpretation of Scores

A third area of concern in the assessment of functional
literacy involves the use and interpretation of test scores.
For the most part measurement of literacy has utilized norm
referenced procedures whereby an individual's performance on
a test is reported in terms of his relationship to others who
have taken the test. One primary aim of norm referenced tests
is to determine maximum discriminability among individuals, and
items are included which will maximize this discrimination. A
grade level score refers to the degree to which an individual
performs relative to others; it does not indicate the degree
to which an individual performs relative to a standard or
criterion of mastery of a given task. In literacy assessment,
the question to be asked is whether or not an individual has
mastered a sufficient number of reading tasks which are
representative of functional literacy. Reference is made to the
amount and type of material mastered (i.e., whether or not the
individual has achieved criterion) rather than how well he
responds compared to others. The important consideration
is whether or not the individual has demonstrated mastery of
the stated tasks.
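
The contrast between the two interpretations can be sketched in a few lines of Python. This is an illustration only: the function names, the sample scores, and the 80 percent cutoff are hypothetical assumptions and are not taken from R/EAL or from any test discussed here.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: percent of the group scoring below this score."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


def has_mastered(score, n_items, criterion=0.8):
    """Criterion-referenced view: did the examinee reach the cutoff?

    criterion is a hypothetical proportion-correct cutoff, assumed here
    purely for illustration.
    """
    return score / n_items >= criterion


# Hypothetical raw scores on a 25-item test for a group of weak readers.
group = [5, 8, 10, 12, 15]

# The top scorer ranks well relative to the group...
rank = percentile_rank(15, group)   # 80.0
# ...yet has not reached the assumed mastery cutoff (15/25 = 0.60).
mastered = has_mastered(15, 25)     # False
```

A high standing relative to others and mastery of the stated tasks are separate questions; the second call answers the one posed in literacy assessment.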

Other difficulties also occur with the use of grade
level scores in reporting performance of adolescent and adult
groups. Many reading achievement tests were normed on
elementary school children. Little meaning can be attached to the
statement that an adult receives a grade level score of 4.2.
This could mean either that he reads two months better than the
average fourth grader at entrance into school or that he reads
fourth grade material successfully but does not read fifth
grade material as well. In any event he is being compared
with fourth grade children on inappropriate material and thus
the score's meaning is questionable.

Purpose of the Study

This study describes the research procedures used in
the development and validation of R/EAL (Reading/Everyday
Activities in Life), a new test developed by the author to
overcome some of the specific problems discussed above in assessing
functional literacy among adolescents and adults. Specific
objectives of this study are to:

1. Provide information about the design and development
of R/EAL, including determination of reading criteria,
establishment of task analyses, production and selection of
individual test items, development of test format and procedures, and
completion of the final version of R/EAL.

2. Provide information about the validation of R/EAL,
including procedures and data on reliability and validity for
a select sample.

Design and Development of R/EAL

The following organizational scheme was followed in 
the construction of R/EAL: initial preparation and construction
of the instrument, preliminary field testing, item analysis and
selection, and production into its present form.

During the initial preparation and construction of the
instrument, the author was guided by a number of conditions
which attempted to overcome inadequacies of existing tests. Firstly,
the content must be representative of activities which could be
considered directly related to practical life reading
experiences. In addition, the content must be suitable for
adolescents and adults. Further, the content must be presented in
such a fashion as to closely resemble the appearance of the
material as it usually is found. Since empirical information
documenting frequently read material was not available at the
time of the initial test development, the author selected
materials and activities based on a logical and common sense
approach. The identified areas included sets of directions,
applications, technical documents, etc.

Secondly, it was decided to provide a test format that
would be motivating, eliminate administrator bias, have no
time limits, and eliminate group test taking situations.
Furthermore, many situations in which adolescents and adults
are measured for functional literacy may not have trained test
administrators, or may not have regularly scheduled classes,
which makes the usual group test administration procedures
difficult. Also, placing the control of the test taking situation
directly in the hands of the individual being tested appeared
to be particularly desirable for adolescents and adults. As
a response to these conditions, it was decided to use an
individually controlled audio cassette input for all test
directions, information and questions, and a booklet presenting
facsimiles of various reading materials. In a study of self
concept comparing audio-tape administration versus teacher
administration, Giguere and Baker (1971) "... found that a
replicable method of test administration not subject to teacher
mood or preparation will provide more representative data than
one that varies from test occasion to test occasion," (pp. 9-10).

Thirdly, mode of student response needed to be
determined. Questions which utilize a multiple choice response
mode, while easier and less time consuming to score, do not
allow the individual to demonstrate his ability to actually
perform certain tasks associated with the content. They may,
in fact, only reveal the individual's ability to make
intelligent choices from among a set of alternatives. The decision
was thus made to use the student constructed, or open-ended,
response mode on the assumption that it more accurately
represented a student's ability to perform certain tasks.

Finally, the important decision was made to construct
the test following procedures recommended by Nitko (1971).
He suggests, in referring to the development of criterion-
referenced tests, that classes of behavior be defined, a set
of test situations be specified, a representative sample of
tasks be selected, and that the obtained score be capable
of expressing the individual's performance characteristics
in the classes. Task analyses detailing the terminal objective
and each of the enabling objectives necessary in the
reading of a given material were developed. Figure 1 is
an example of one such analysis.

Insert Figure 1 About Here 

Norm-referenced tests were considered unsuitable
since they would not provide an indication of the student's
ability to master functional literacy tasks, but rather his
ability in relation to others. Criterion-referenced, or
content-referenced, tests provide information about how well
the student has mastered the content of the test. The score
interpretation was to be made directly in terms of mastery of a
predetermined cutoff point relating to a set of objectives.
Glaser and Nitko suggest that "A criterion-referenced test
is one that is deliberately constructed to yield scores that
are directly interpretable in terms of specified performance
standards," (in Nitko, 1971, p. 3).

Preliminary field testing was conducted during the
academic year 1971-1972 on approximately three hundred
individuals located in the Washington, D.C. area. The majority of
the group were inner city disadvantaged high school students,
although some seventh and eighth grade suburban students and
some students attending adult basic education classes were
also included. The purposes of this field testing were (1)
to determine if the content and format proved interesting,
practical, and workable and what changes needed to be made;
(2) to determine which questions needed to be eliminated or
modified because of ambiguities, lack of clarity, and the
like; (3) to determine which reading content and corresponding
questions should be used in the revision of R/EAL.

For the most part, the procedures followed during the
field testing were the same for all groups, although a few
modifications had to be made depending on prevailing
conditions in schools. Potential subjects were identified,
arrangements made with appropriate officials for testing, and
actual testing conducted. Since the test was administered
via individually controlled tape recorder, the assembly of
equipment presented additional problems, but these were
overcome by the purchase of sufficient equipment. In most instances
students were tested in small group situations with a member
of the project staff present. Each student operated his own
recorder and paced the input according to his needs. In a few
cases one recorder was used for a small group and the input
was paced by demands from the various group members to "repeat"
or "stop." This procedure was abandoned early because it
appeared to result in additional problems. In a few cases
students were tested on an individual basis by a member of the
project staff.

Since students were tested in a variety of situations
it was not possible to obtain information about their reading
ability, except gross judgment by teachers or counselors. For
example, if a student was enrolled in an adult education
program it was assumed that he had difficulty reading. In most
cases standardized test scores were unavailable and it was
decided to forego a previously anticipated plan to explore the
relationship between R/EAL test performance and performance
on another reading test. Such steps were undertaken,
however, during the validation phase.

The data collected during this field testing were
subjected to a variety of statistical procedures, including
the computation of p values, or the proportion of students
passing each item; means and standard deviations; and factor
analyses. Since R/EAL was designed as a criterion-referenced
test, the usual procedures for selection of items could not be
followed. That approach would result in the use of items which
would provide maximum discriminability of students rather than
the use of items which would tap the predetermined content.
For example, it might be desirable to include an item which
measured one of the basic predetermined objectives, yet that
item might have a p value that did not provide maximum
discriminability. In discussing the development of objective-
based tests, Giguere and Baker (1971) indicate a desire for
but lack of procedural guidelines for the interpretation of
data (p. 10). Others assume that if an item measures an
objective that is sufficient reason for its inclusion in a test.
Since precise guidelines for the selection of items for a
criterion-referenced test are lacking, the author decided to
combine both a logical and empirical approach to determine
the desired items.

Based on the information gleaned from this field
testing phase, a revised version of R/EAL was constructed.
Revisions included (a) increasing the number of reading
criteria presented in one booklet from seven to nine, thereby
lengthening the test from a thirty-five item to a forty-five
item test, a procedure designed both to tap additional content
areas and to increase reliability; (b) eliminating or modifying
some questions which appeared to have unusual response patterns
(e.g., the lower half of the distribution did better than the
upper half) or which did not directly relate to the
predetermined objectives; (c) shortening the initial directions
presented on the audio cassette since informal observations
revealed that such directions did not add to students'
understanding of the tasks to be done; (d) altering the reading
criterion in a few cases by decreasing the length of the
material presented, eliminating ambiguous portions, or adding
additional information such as a title or heading; (e)
having the audio input produced by a trained professional in a sound
studio to eliminate extraneous sound or difficulties in
comprehending the spoken word.

Thus the present revision of R/EAL (a) contains
reading criteria selected from such common daily reading
activities as food store advertisements, directions for preparation
of food, want ads, leases, maps, etc.; (b) includes tasks
related to each criterion developed based on the reading
functions required to deal with the individual criteria. Such
tasks have been translated into specific questions following
guidelines recommended by Davis (1971): "item analysis data
[were] used as a basis for refining the items through
insightful editing, but the use of item-test correlation
coefficients or difficulty indexes [was] not allowed to
affect the validity of the test by distorting the proper
representation of behavior categories..." (p. 1); (c) presents
all directions and questions in aural form via individually
operated cassette players.

Validation of R/EAL

This section deals with the methodology used for
validating R/EAL, including sample selection, testing
procedures, and item analysis. Data relating to individual item
and total test statistics, factor analysis data, reliability,
and validity are reported.
Methodology
Subjects. The subjects selected for the validation
sample were students enrolled in a residential manpower
training program servicing disadvantaged youth ages 15 to 21.
Most of the enrollees were high school dropouts, many of whom
lack basic skills in reading. For the most part, Job Corps
enrollees are Black, Mexican-American, rural white, Puerto
Rican or American Indian. To be eligible for the program
their families must be at the poverty level.

Four Job Corps Centers, two male and two female,
were selected for participation in the program. (Job Corps
has only a very few Centers which are coeducational.) The
Centers were selected to represent the various groups served
in the Job Corps program. Their geographic locations included
sites in New Jersey, West Virginia, New Mexico and Texas. Total
numbers of enrollees varied from Center to Center.

Selection of subjects at each Center varied depending
on conditions existing at the Centers during time of testing.
In two Centers a random sample (one computer-generated) was
identified. An attempt was made to select a random sample in
a third Center but this was not entirely possible. Because of
conditions at the fourth Center whole classes of enrollees
were used for testing. Although procedures for sample
selection varied, it did not appear that any known bias was
introduced in the selection of subjects. Since students in two
Centers were of Mexican-American descent, a restriction was
placed that only those fluent in English could participate.
(Another study was conducted with information transmitted
in Spanish, but is not the subject of this paper.) Table
1 reports the sample size for each Center, the total size,
and the distribution by sex.

Insert Table 1 About Here 

Testing Procedures. At all Centers testing was
supervised by the author, with the assistance of one or two others
on her staff. Centers also provided personnel who were
available to assist with equipment, scheduling, and the like.

All enrollees were tested in a special room designated
for testing. About twenty-five students were tested at a time.
Each student had his own individually controlled and operated
cassette recorder and earphone and was permitted to work at
his own pace. Upon completion of the test he returned to his
normal pursuits and a new enrollee took his place. Thus testing
proceeded on an almost continuous basis (except for a lunch
break). In some Centers over a hundred enrollees were tested
in one day.

Directions for all students were the same and consisted
only of explaining the use of the tape recorder and earphone;
prior to the actual test date, however, students had been told
that they would participate in a new type of literacy testing.

Student responses were scored correct or incorrect 
according to a predetermined objective scoring key. Partial 
or ambiguous responses were judged incorrect. All scoring 
was carried out by the author or members of her staff. 

Data Analysis. All pertinent enrollee data were
transferred to IBM cards for use in the data analysis. All
statistical analyses were computed on an IBM 360 computer.

Magnuson (1967), in discussing the usefulness of item
analysis procedures, indicated the relationship of item analysis
techniques to the questions of reliability and validity. He
suggested that the dependability of an obtained score is "an
estimate of his true score (i.e., the reliability of data),
which determines the value of the test. The reliability and
validity of the data depend on the properties of the individual
items which make up the test," (p. 197). Item analysis
statistics were computed on available data. These included p values
(proportion of subjects passing an item) for total group,
upper half, and lower half; inter-item correlations; point
biserial correlations; and factor analyses.
Results 

This section provides specific information obtained 
from the above-described sample.

Item Analysis. The information provided in Table 2

Insert Table 2 About Here

indicates the difficulty level of each item in the test for
the sample described above. A high p score reflects an easy
item; a low p score the reverse. If we are concerned with
discriminability of items, a high p score would be a poor
indicator of discriminability. But this test is concerned
with mastery of a predetermined set of objectives. Thus, a
high p score would suggest that a large number of the group 
tested had mastered that item. 
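
As a minimal sketch of how such p values are obtained, the following Python computes the proportion of examinees passing each item from a matrix of scored responses. The function name and the small data set are hypothetical illustrations, not data from the study.

```python
def item_p_values(responses):
    """Return the proportion of examinees passing each item.

    responses: one row per examinee, each row a list of item scores
    coded 1 (correct) or 0 (incorrect).
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_examinees
        for i in range(n_items)
    ]


# Hypothetical scored responses for four examinees on four items.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

p_values = item_p_values(scores)
print(p_values)  # [1.0, 0.5, 0.25, 0.5]
```

In this illustration the first item (p = 1.0) was passed by everyone, which under a mastery interpretation means the whole group had mastered it, even though such an item cannot discriminate among examinees.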

Additional information about difficulty levels of 
individual items can be obtained by examining the proportion 
of the upper half of the distribution that passed a given 
item and the proportion of the lower half that passed the same 
item. If the item is sound, it would be anticipated that a
higher proportion of the upper half than the lower half would
pass the item. In addition to examining the item difficulty,
item analysis procedures also call for correlations of items
with total test scores. Guilford (1965) suggests that item-test
correlations are more important than difficulty of individual
items because they indicate whether or not a test item
discriminates in line with other items in the test. For this type
of analysis Guilford suggests the use of a point biserial
correlation. Magnuson (1967) suggests that the magnitude of
the point biserial correlation is greatly affected by the
difficulty level of an item. This results in "very easy or
very difficult test items (having) systematically lower
coefficients for the correlation with the test than items of medium
difficulty," (p. 209). Because the difficulty level
of a test item affects the correlation, caution must be
exercised in its interpretation. Again, it must be emphasized
that certain items which were logically included based on their
relationship to the predetermined tasks and objectives may
be at the extreme difficulty levels. Table 3 reports the
point biserial correlations between individual test items
and total score.
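
The two checks described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually programmed for the study: the function names and data are hypothetical, the halves are formed by ranking examinees on total score (dropping the middle examinee when the group size is odd), and the item being analyzed is left in the total score.

```python
from math import sqrt
from statistics import mean, pstdev


def half_split_p(responses, item):
    """Proportion passing `item` in the upper and lower halves by total score."""
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda k: totals[k])
    half = len(order) // 2
    lower, upper = order[:half], order[-half:]
    p_upper = sum(responses[k][item] for k in upper) / half
    p_lower = sum(responses[k][item] for k in lower) / half
    return p_upper, p_lower


def point_biserial(responses, item):
    """Point biserial correlation between one 0/1 item and total score.

    Uses r = (M_pass - M_fail) / s_total * sqrt(p * q), with s_total the
    population standard deviation. Returns None when the coefficient is
    undefined (everyone passed, everyone failed, or no score variance).
    """
    totals = [sum(row) for row in responses]
    passed = [t for row, t in zip(responses, totals) if row[item] == 1]
    failed = [t for row, t in zip(responses, totals) if row[item] == 0]
    sd = pstdev(totals)
    if not passed or not failed or sd == 0:
        return None
    p = len(passed) / len(totals)
    return (mean(passed) - mean(failed)) / sd * sqrt(p * (1 - p))


# Hypothetical scored responses: six examinees, three items.
scores = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

p_upper, p_lower = half_split_p(scores, 0)  # 1.0 and about 0.333
r_pb = point_biserial(scores, 0)
print(round(r_pb, 3))  # 0.746
```

An item passed or failed by the whole group yields no coefficient at all in this sketch, which is the limiting case of the lowered correlations for very easy or very difficult items.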



Insert Table 3 About Here 



Additional statistics about the items were computed,
in particular the correlation of items with other items in the
test. Table 4 reports these correlations for the
items used in R/EAL. In interpreting these intercorrelations

Insert Table 4 About Here

it should be noted that a number of aspects of reading are being
measured and that reading is not viewed as a unitary trait.

The need for additional information about the relationship
of each item to the others necessitated a factor analysis.
A principal components analysis with a varimax
rotation program was utilized. These calculations yielded
three factors which accounted for 100% of the common variance,
but only 32% of the total variance. Table 5 gives the
information obtained in the analysis.

Insert Table 5 About Here

Reliability. Estimates of reliability were calculated
for R/EAL and are reported herein. A number of procedures
have been developed for estimating the reliability of a test.
In particular, special procedures for estimating the
reliability of a content-referenced or mastery test have been
proposed. These procedures are still in experimental stages,
however. Livingston (1972) offered a reliability coefficient
based on deviations of scores from the criterion score rather
than deviations from the mean. Harris (1972), in response to
Livingston's work, suggested that "... his work fails to advance
reliability theory for the special case of criterion-referenced
(content-referenced) testing" (p. 29). Marshall (1973), too,
suggested difficulties with Livingston's coefficient. Rather,
he offered additional information related to the methodology
of determining reliability of criterion-referenced tests.
He suggested three indices to be used in the estimate of
reliability: index of efficiency, index of sensitivity of
instruction and index of separation. Since Marshall's work is still
highly experimental, however, it was decided not to pursue
these coefficients at this time.
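
For readers unfamiliar with a coefficient based on deviations from a criterion score, the following sketch shows the form in which Livingston's coefficient is commonly stated. It was not computed in this study; the function and parameter names are hypothetical, and the exact formulation should be checked against Livingston (1972).

```python
def livingston_k2(reliability, variance, mean, criterion):
    """Criterion-referenced reliability as commonly attributed to Livingston.

    k2 = (r * var + (mean - C)^2) / (var + (mean - C)^2)

    reliability: a classical reliability estimate (for example KR-20)
    variance:    variance of the total scores
    mean:        mean total score
    criterion:   the criterion (cutting) score C
    """
    shift = (mean - criterion) ** 2
    denominator = variance + shift
    if denominator == 0:
        raise ValueError("scores have no variance around the criterion")
    return (reliability * variance + shift) / denominator
```

When the criterion score equals the mean, the expression reduces to the classical reliability estimate; as the criterion moves away from the mean, the coefficient rises.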

Fluidity of thought concerning acceptable procedures
for estimating reliability of mastery tests has led the
author to select classical internal consistency measures for
estimating the reliability of R/EAL. Kuder-Richardson 20
(KR-20) procedures were employed to provide measures of "both
equivalence (of items) and homogeneity" (Anastasi, 1961,
p. 122). Table 6 offers the results of the calculations.
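
A minimal sketch of the KR-20 computation follows. It is offered as an illustration only: the function name `kr20` is hypothetical, and the use of the population variance of total scores (dividing by n) is an assumption, since the source does not say which form was used.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses: one row per examinee, each row a list of 0/1 item scores.
    KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores)
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # population variance
    if variance == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # item difficulty
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / variance)


# Made-up responses: four examinees, three items
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
```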
Validity

Questions of validity need to be considered in determining
if a test is appropriate for a particular use. Validity
refers to the degree to which a test actually measures
what it purports to measure (Anastasi, 1961, p. 29).
Further, validity coefficients provide a check on how well the
test fulfills its function.

The APA Standards (1973) also consider the question
of validity of a test. In their terms validity is concerned
with the accuracy of the information that can be inferred
from the test score. The measuring instrument "is an
operational definition of a specified domain of skill or
knowledge" (p. VI). Information related to the operational
definition of functional literacy has been supplied earlier
in the discussion of the rationale and development of R/EAL.

In the discussion here, two types of validity will
be considered: criterion-related validity and content
validity. Criterion-related validity refers to the relationship
of this test to some other (external) criterion designed
to measure the same function. Criterion-related validity
was determined by selecting a standardized reading achievement
test and computing a correlation between R/EAL and the
reading achievement test. For the target population described
above, the Stanford Achievement Test was selected.
The Pearson product-moment correlation between the two
tests was .74 (n = 434) and the standard error of measurement
was equal to 5.28.
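
The reported standard error is numerically consistent with the familiar expression s * sqrt(1 - r), using the standard deviation of 10.36 from Table 6 and the correlation of .74. Whether the figure was actually obtained this way is an assumption; the sketch below only checks the arithmetic, and the function name is hypothetical.

```python
import math


def standard_error(sd, r):
    """Return sd * sqrt(1 - r) for a standard deviation and a coefficient r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return sd * math.sqrt(1.0 - r)


# Values taken from the text and Table 6: s = 10.36, r = .74
value = standard_error(10.36, 0.74)
```

Rounded to two places the result agrees with the 5.28 reported above.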

Content validity is another area to examine when
exploring the validity of R/EAL. A demonstration of content
validity must show that the behaviors sampled in the test are
a representative sample of behaviors from the universe of
behaviors. Giguere and Baker (1971) suggest that the validity
of criterion-referenced tests "does not depend on a series of
highly related correlations but rather on the user's
acceptance of the specified premises upon which the instruments
are based" (p. 2). Reference is made to the Task Analysis
at page 29. Each question used in R/EAL is selected
directly from the Task Analysis and is so designed as to
represent as much of the domain of tasks as is possible.
These Analyses specify objectives of the test and indicate
how the "component tasks make up the total domain" (Standards,
1973, p. VII).
Implications 

The need for a test concerned with the practical
application of reading in daily life is great. R/EAL
attempts to overcome some of the problems inherent in tests
which are presently in use. Reliability and validity
figures on R/EAL tend to support its use as a viable
assessment instrument for functional literacy.

Item analysis figures show a difficulty level of 
items for the total group ranging from .35 to .97 with a 
median of .60. Since items were designed to reflect pre- 
determined objectives and since it was known from other 
information that the sample reflected a range of reading 
abilities, this variation in response to individual items 
would be anticipated. An examination of differences between 
the upper and lower half of the distribution reflects, as 
anticipated, that the upper half had mastered more of the 
tasks than the lower half. 
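
The item difficulty figures discussed above are proportions of examinees answering each item correctly. A minimal sketch of how such figures could be computed for the total group and for the upper and lower halves is given here; the function name and the handling of an odd-sized group (the middle examinee is left out of both halves) are assumptions for illustration.

```python
def item_difficulties(responses):
    """Return (p_upper, p_lower, p_total) for each item.

    responses: one row per examinee, each row a list of 0/1 item scores.
    Examinees are ranked by total score and split into upper and lower halves.
    """
    if len(responses) < 2:
        raise ValueError("need at least two examinees")
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    n_items = len(responses[0])

    def proportion(rows, j):
        # Share of the given examinees who answered item j correctly
        return sum(row[j] for row in rows) / len(rows)

    return [
        (proportion(upper, j), proportion(lower, j), proportion(responses, j))
        for j in range(n_items)
    ]


# Made-up responses: four examinees, three items
result = item_difficulties([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```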

The point biserial correlations suggest that some
items are more closely related to the total score than others,
but the restrictions suggested by Magnuson must be taken
into account in interpreting these correlations. The
intercorrelation matrix (Table 4) and the factor analysis (Table 5)
lend further support to the hypothesis that functional literacy
is not a unitary trait and may be influenced by such factors
as the content or format of the material. In addition, the
factor analysis data seem to suggest that at least three
separate factors are measured by R/EAL. Items in Factor I
come primarily from tasks relating to reading road maps and
road signs. Factor III items are all from the job
application. The heaviest loading is on Factor II and represents
all other items from the test. It is interesting to note
that, in a related study, the Stanford Achievement Test
split fairly evenly on Factors I and II.

Areas for future investigation are of special
importance in a test of this nature. Research relating to
pre-test, post-test differences is currently being completed
and will be reported subsequently. Additional research is
also underway using R/EAL with a population of deaf students.
In that study the audio portion is being translated into a
videotaped total communication presentation. Additional
research with other populations and age groups is also being
considered. The use of R/EAL as a diagnostic/prescriptive
instrument was part of the original design but empirical
validation of these procedures still needs to be undertaken.
Finally, equivalent forms of R/EAL and the use of additional
reading criteria are being developed.

Table 1

Sample Size by Center and Sex

Center    Size*     Sex       Size
1           98      Male       169
2          101      Female     255
3          154
4           71
Total      434                 434

*To a general degree these sizes and sex distributions reflect
a proportion of the total size and sex distributions at the
Centers.
Table 2

Item Difficulty for Total Group, Upper and Lower Halves (n = 434)

Item No.    P upper        P lower    P total
 1          .99            .78        .88
 2          .84            .53        .71
 3          .98            .77        .88
 4          [illegible]    .79        .89
 5          .80            .37        .58
 6          .99            .76        .87
 7          .66            .46        .56
 8          .55            .26        .41
 9          [illegible]    .21        .36
10          .53            .13        .35
11          .94            .62        .78
12          .93            .49        .71
13          .72            .29        .50
14          [illegible]    .22        .45
15          .62            .14        .38
16          .86            .49        .68
17          .82            .45        .63
18          .81            .40        .60
19          .50            .17        .36
20          .71            .30        .51
21          .99            .95        .97
22          .86            .48        .67
23          .93            .67        .80
24          .80            .47        .66
25          .59            .27        .43
26          .91            .37        .64
27          .90            .43        .67
28          .80            .37        .58
29          .92            .53        .72
30          .82            .24        .53
31          .71            .28        .49
32          .82            .27        .54
33          .97            .53        .75
34          [illegible]    .28        .50
35          .75            .26        .50
36          .73            .29        .51
37          .83            .38        .61
38          .84            .27        .56
39          .79            .28        .54
40          .76            .25        .51
41          .99            .94        .96
42          .98            .91        .95
43          .82            .46        .64
44          .94            .61        .77
45          .77            .19        .48


Table 3

Point-Biserial Correlations
Individual Test Items to
Total Test Score

Item Number   Correlation     Item Number   Correlation
 1            .48             26            .66
 2            .37             27            .60
 3            .49             28            .51
 4            .47             29            .63
 5            .55             30            .64
 6            .56             31            .47
 7            .33             32            .57
 8            .39             33            .65
 9            .37             34            .48
10            .44             35            .55
11            .47             36            .53
12            .59             37            .59
13            .48             38            .66
14            .55             39            .62
15            .52             40            .62
16            .50             41            .25
17            .51             42            .25
18            .55             43            .45
19            .45             44            .53
20            .47             45            .63
21            .27
22            .56
23            .47
24            .55
25            .43
Table 5

Factor Loadings for Items in R/EAL*

Item Number   Factor I   Factor II   Factor III
 1            .49
 2            .42
 3            .52
 4
 5            .52
 6            .44
 7
 8
 9
10
11                       .52
12                       .60
13                       .43
14                       .57
15                       .41
16
17
18                       .44
19
20
21
22                       .44
23                       .46
24                       .48
25
26                       .53
27                       .52
28
29                       .58
30                       .54
31            .64
32                       .40
33                       .51
34            .48
35            .42
36                       .42
37                       .49
38                       .51
39            .42        .43
40                       .47
41
42
43
44                       .50

[Entries for Factor III and the row for item 45 are not legible in the source.]

*Only those loadings which were .40 or over are reported.



Table 6

Reliability of R/EAL in Target Population (KR-20)

n = 434     Mean = 28.09     s = 10.36